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Abstract 

Computer simulation has become the standard tool in many engineering 
fields for designing and optimizing systems, as well as for assessing their relia¬ 
bility. Optimization and uncertainty quantification problems typically require 
a large number of runs of the computational model at hand, which may not be 
feasible with high-fidelity models directly. Thus surrogate models (a.k.a meta¬ 
models) have been increasingly investigated in the last decade. Polynomial 
Chaos Expansions (PCE) and Kriging are two popular non-intrusive meta¬ 
modelling techniques. PCE surrogates the computational model with a series 
of orthonormal polynomials in the input variables where polynomials are cho¬ 
sen in coherency with the probability distributions of those input variables. A 
least-square minimization technique may be used to determine the coefficients 
of the PCE. On the other hand, Kriging assumes that the computer model 
behaves as a realization of a Gaussian random process whose parameters are 
estimated from the available computer runs, i.e. input vectors and response 
values. These two techniques have been developed more or less in parallel so 
far with little interaction between the researchers in the two fields. In this 
paper, PC-Kriging is derived as a new non-intrusive meta-modeling approach 
combining PCE and Kriging. A sparse set of orthonormal polynomials (PCE) 
approximates the global behavior of the computational model whereas Krig¬ 
ing manages the local variability of the model output. An adaptive algorithm 
similar to the least angle regression algorithm determines the optimal sparse 
set of polynomials. PC-Kriging is validated on various benchmark analytical 
functions which are easy to sample for reference results. From the numeri¬ 
cal investigations it is concluded that PC-Kriging performs better than or at 
least as good as the two distinct meta-modeling techniques. A larger gain in 
accuracy is obtained when the experimental design has a limited size, which 
is an asset when dealing with demanding computational models. 

Keywords; Emulator - Gaussian process modeling ~ Kriging, meta¬ 
modelling ~ Polynomial Chaos Expansions ~ PC-Kriging - Sobol’ function 
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1 Introduction 


Modern engineering makes a large use of computer simulation in order to de¬ 
sign systems of ever increasing complexity and assess their performance. As 
an example, let us consider a structural engineer planning a new structure. 
An essential part of his work is to predict the behavior of the not-yet-built 
structure based on some assumptions and available information about e.g. ac¬ 
ceptable dimensions, loads to be applied to the structure and material prop¬ 
erties. These ingredients are basically the input parameters of computational 
models that predict the performance of the system under various conditions. 
The models help the engineer analyze/understand the behavior of the struc¬ 
ture and eventually optimize the design in order to comply with safety and 
serviceability constraints. 

Similar conditions can also be found in many other scientific and engi¬ 
neering disciplines. The common point is the simulation of the behavior of a 
physical process or system by dedicated algorithms which provide a numeri¬ 
cal solution to the governing equations. These simulation algorithms aim at 
reproducing the physical process with the highest possible fidelity. As an ex¬ 
ample, finite element models have become a standard tool in modern civil and 
mechanical engineering. Due to high fidelity, such models typically exploit the 
available computer power, meaning that a single run of the model may take 
hours to days of computing, even when using a large number of CPUs. 

An additional layer of complexity comes from the fact that most input 
parameters of such computational models are not perfectly known in prac¬ 
tice. Some parameters {e.g. material properties, applied loads, etc.) may 
exhibit natural variability so that the exact value to be used for simulating 
the behavior of a particular system is not known in advance (this is referred 
to as aleatory uncertainty). Some others may have a unique value which is 
however not directly measurable and prone to lack of knowledge ( epistemic un¬ 
certainty). In a probabilistic setup these parameters are modelled by random 
variables with prescribed joint probability density function, or more generally, 
by random fields. The goal of uncertainty propagation is to assess the effect 
of the input uncertainty onto the model output, and consequently onto the 
performance of the system under consideration (De Rocquigny et al.| 2008 


Sudret, 2007). 


Propagating uncertainties usually requires a large number of repeated calls 
to the model for different values of the input parameters, for instance through 
a Monte Carlo simulation procedure. Such an approach usually require thou¬ 
sands to millions of runs which is not affordable even with modern high per¬ 
formance computing architectures. To circumvent this problem, surrogate 
models may be used, which replace the original computational model by an 


easy-to-evaluate function (Storlie et al. 


2009 

fHastie et al. 

2001 

Forrester 


et ah, 2008). These surrogate models, also known as response surfaces or 


meta-models, are capable of quickly predicting responses to new input real¬ 
izations. This allows for conducting analyses which require a large number 
of model evaluations, such as structural reliability and optimization, in a rea¬ 
sonable time. 

Among the various options for constructing meta-models, this paper fo¬ 
cuses on non-intrusive approaches, meaning that the computational model is 
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considered as a “black-box model”: once an input vector (i.e. a realization 
of the random input in the process of uncertainty propagation) is selected, 
the model is run and provides an output vector of quantities of interest. No 
additional knowledge on the inner structure of the computer code is assumed. 
Popular types of meta-models that may be built from a limited set of runs of 
the original model (called the experimental design of computations) include 


Polynomial Chaos 1 
process modeling ( 

Expansions (PCE) Ghanem and Spanos (2003|), Gaussian 

also called Kriging) Sacks et al. (1989); Rasmussen and 

Williams (2006); Santner et al. (2003); Stein (199 

1), and support vector ma- 

chines Gunn 

(1998) 

Smola and Scholkopf] (2006); 

/azquez and Walter (2003); 

Clarke et al 

(2003 

), which have been extensively investigated in the last 


decade. Polynomial chaos expansions and Kriging are specifically of interest 
in this paper. 

Polynomial chaos expansions (PCE), also know as spectral expansions, 
approximate the computational model by a series of multivariate polynomials 
which are orthogonal with respect to the distributions of the input random 
variables. Traditionally, spectral expansions have been used to solve partial 


differential equations in an intrusive manner Ghanem and Spanos (2003). In 


this setup truncated expansions are inserted into the governing equations and 
the expansion coefficients are obtained using a Galerkin scheme. This pio¬ 
neering approach called spectral stochastic finite element method (SSFEM), 

Xiu and Karniadakis (2002); Sudret et al. ( 2004[ ); Wan 
2006); Berveiller et al. (2006a), among others. These 


was later developed by 


and Karniadakis (2005 


intrusive methods require specific, problem-dependent algorithmic develop¬ 
ments though. Because it is not always possible and/or feasible to treat a 
computational model intrusively, especially when legacy codes are at hand in 
an industrial context, non-intrusive polynomial chaos expansions were devel¬ 
oped. So-called projections methods were developed by [Ghiocel and Ghanem 


(2002); 

Le Maitre et al. 

(2002 

); Keese and Matthies 

(2005 

); 

Xiu and Hes 

thaven 

(2005 

), see a review in 

Xiu 

(2009 

). Least-square minimization tech 


niques have been introduced by Ghoi et al. (2004); Berveiller et al. (2006b); 


Sudret (2008). Further developments which combine spectral expansions and 


compressive sensing ideas have lead to so-called sparse polynomial chaos ex- 


pansions 

Blatman and Sudret 

(2008 

2010a|b) 

; Doostan and Owhadi| 

(2011 

Blatman and Sudret 

(2011 

); 

Eoostan et al. 

2013) 

Jakeman et al. 

(2014 


This is the approach followed in this paper. Further recent applications of 
PCE to structural reliability analysis and design optimization can be found 


m 


Eldred (2009); Eldred et al. (2008); Sarangi et al. (2014). 


The second meta-modeling technique of interest in this paper is Krig¬ 


ing, which originates from interpolating geographical data in mining Krige 


(1951) and is today also known as Gaussian process modeling Rasmussen 


and Williams 

(2006 

); 

Santner et al. (2003 


The Kriging meta-model is in¬ 
terpreted as the realization of a Gaussian process. Practical applications 


can be found in many fields, such as structural reliability analysis Kaymaz 


(2005) 

; Beet et al. 

(2012) 

; Echard et al. ( 

2011); 

Bichon et al. (2008 

; Dubourg 

et al. ( 

2013 

); Dubourg and Sudret 

(2014) 

and design optimization ^ 

ones et al. 

(1998) 

; Dubourg et al. ( 

2011 

); [Dubourg 

(2011 

). The implementation of the 


Kriging meta-modeling technique can be found in e.g. the Matlab toolbox 


DACE Lophaven et al. (2002) and the more recent R toolbox DiceKriging 
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meta-modeling approaches have been applied in 
various fields rather independently. To our knowledge, there has not been 
any attempt to combine PCE and Kriging in a systematic way yet. To bridge 
the gap between the two communities, this paper aims at combining the two 
distinct approaches into a new and more powerful meta-modeling technique 
called Polynomial-Chaos-Kriging (PC-Kriging). As seen in the sequel the 
combination of the characteristics and the advantages of both approaches 
leads to a more accurate and flexible algorithm which will be detailed in this 
paper. 

The paper is organized as follows. PCE and Kriging are first summarized 
in Section and Section respectively. Section introduces the new meta¬ 
modeling approach PC-Kriging as the combination of the two distinct ap¬ 
proaches. The accuracy of the new and traditional meta-modeling approaches 
is compared in Section on a set of benchmark analytical functions. 


Roustant et al. (2012, 2013 


So far the two distinct 


2 Polynomial Chaos Expansions 

2.1 Problem definition 

Consider the probability space {Q,,iF,V), where denotes the event space 
equipped with cr-algebra J- and the probability measure V. Random variables 
are denoted by capital letters X{oj) : O i—?■ Vx C M and their realizations 
denoted by the corresponding lower case letters, e.g. x. Random vectors {e.g. 
X = {X\,... ,Xm]^) and their realizations {e.g. x = {xi,..., xm}"*") are 
denoted by bold faced capital and lower case letters, respectively. 

In this context, consider a system whose behavior is represented by a com¬ 
putational model A4 which maps the M-dimensional input parameter space 
to the 1-dimensional output space, i.e. Xi : x £ Vx C i—)• y G M 

where x = {xi,... ,xm}^■ As the input vector x is assumed to be affected 
by uncertainty, a probabilistic framework is introduced. Due to uncertain¬ 
ties in the input vector, it is represented by a random vector X with given 
joint probability density function (PDE) fx- Eor the sake of simplicity the 
components are assumed independent throughout the paper, so that the joint 
PDE may be written as the product of the marginal PDEs denoted by fxi, 
i = 1,... ,M. Note that the case of dependent input variables can easily be 
addressed by using an isoprobabilistic transform first, such as the Nataf or 


Rosenblatt transform Blatman and Sudret (2010a). The output of the model 


is a random variable Y obtained by propagating the input uncertainty in X 
through the computational model Xi: 


Y = Xi{X). 


( 1 ) 


In this paper we consider that the computational model is a deterministic 
mapping from the input to the output space, i.e. repeated evaluations with 
the same input value xq G Vx lead to the same output value yo = Xi{xQ). 

Provided that the output random variable K is a second-order variable {i.e. 
E [K^] < -boo), it can be cast as the following polynomial chaos expansion 
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( 2 ) 


(PCE) Ghanem and Spanos (2003); [Soize and Ghanem (2004): 
Y = M{X)= Y, 


where {a^, o: G N^} are coefficients of the multivariate orthonormal polyno¬ 
mials ipaiX) in coherency with the distribution of the input random vector 
X, a = {ai, ..., aM} is the multi-index and M is the number of input vari¬ 
ables (dimensions). Since the components of X are independent, the joint 
probability density function fx is the product of the margins fxi- Then a 
functional inner product for each marginal PDF fxi is defined by 


kA 2 )i = / 4>l{x) (j) 2 {x) fxi{x)dx, 

JVi 


( 3 ) 


for any two functions such that the integral exists. For each vari¬ 

able f = 1,..., M an orthonormal polynomial basis can be constructed which 


satisfies Xiu and Karniadakis (2002): 




Pj'\x) Pk’{x) fxi{x)dx = 6jk, 


)(0 


'V, 


( 4 ) 


(i) (i) 

where Pj ', P^ two candidate univariate polynomials in the i-th variable, 
Pi is the support of the random variable Xi and 5jk is the Kronecker delta 


which is equal to 1 for j = k and equal to 0 otherwise. Xiu and Karniadakis 


(2002) summarize various orthonormal bases for some classical PDFs, some 


of which are summarized in Tab. [TJ 


Table 1: Classical orthogonal/orthonormal polynomials (as presented in Sndret 

pOli])) 


Distribution 

PDF 

Orthogonal polynomials 

Orthonormal basis 

Uniform 

1 ]- i , i [( 2;)/2 

Legendre Pk{x) 

Pk{x)/ yj 2 k+l 

Gaussian 

_J_ -X^I2 

Hermite H(.^{x) 

He^{x)/y/k\ 

Gamma 

x°‘e~^l^+{x) 

Laguerre L'^{x) 


Beta 


Jacobi J^’^(x) 

q -2 _ 2 “+*+l r(fc-|-a-|-l)r(fc-|-6+l) 

'^a,b,k 2k+a+b+l r(fc-|-a-|-b-|-l)r(fc+l) 


The multivariate polynomials in Fq. ([^ are then composed of univariate 
polynomials by tensor product, i.e. by multiplying the various polynomials 
in each input variable: 

M 

v^,,(x)=nv'«(x,), (5) 

i=\ 

u\ 

where ifal is the polynomial of degree Oj in the f-th variable. 

The main idea of polynomial chaos expansion (PGF) is then to surrogate 
the computational model by an infinite series of polynomials as shown in 
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Eq. Q. In practice, it is not feasible to handle infinite series, thus the need 
of a truncation scheme. Such a truncation scheme corresponds to a set of 
multi-indexes a ^ A C such that the system response is accurately 
approximated with respect to some error measure |Blatman and Sudret (2010a 

2MT| ): 

Y PS y(PCE) ^ ac'ipo.iX). 

a£A 

There are several ways to select a priori a truncation set A. A simple and 
commonly applied scheme consists in upper-bounding the total degree of poly¬ 
nomials to a maximal value p. The total degree of polynomials is defined by 


( 6 ) 


Q = 


M 

E 

i=l 


ai 


(7) 


In this case the set of multi-indices is denoted by A^’^ = {o: G : |q:| < p} 
where p is the maximal total polynomial degree. The cardinality of the set A 
reads: 

^ {M+p)\ 

I I M!p! ■ ^ ’ 

This cardinality grows polynomially with both M and p. Such a truncation 
scheme thus leads to non tractable problems if the response is highly nonlinear 
in its input parameters (need for a large p) and/or if the size of the input 
vector X is large (say, M > 10). This problem is referred to as the curse of 
dimensionality. 


Blatman and Sudret (2010a); Blatman (2009) proposed a more restrictive 


truncation scheme called hyperbolie truncation set. The authors observed that 
many systems tend to have only low-degree interaction polynomials and thus it 
is not necessary to compute all interaction terms of higher polynomial degree. 
The hyperbolic index set is based on the following q-norm: 


Af’P = {a€N 


M 


a 


<P}, 


where 


I Q: llqr - 



(9) 


( 10 ) 


0 < g < 1 is a tuning parameter and p is the maximal total degree of the poly¬ 
nomials. A decreasing q leads to a smaller number of interactive polynomials, 
i.e. a smaller set of polynomials. When g —)• 0, only univariate polynomials 
are left in the set of polynomials which is called an additive model I Sudret 
(2014). For the sake of illustration, the retained polynomial indices a G Ag 


of a 2-dimensional input space (M = 2) and varying p and q are illustrated 
in Figure The indices denoted by • are part of Aq^'^ and the solid black 
line represents ||Q:||g = p. Note that for q = 1, the hyperbolic index sets are 
equivalent to the total degree index set (see Eq. 0). 


2.2 Computation of the coefficients 

After defining the set of candidate polynomials, the next step is to determine 
the expansion coefficients of each multivariate polynomial ifaix). In this 
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Figure 1: Representation of a hyperbolic index set ck G for various p and q 
(M = 2) 


paper we consider only non-intrusive methods which are based on repeatedly 
evaluating the model M. over a set of input realizations X = ..., 

the so-called experimental design. Different non-intrusive methods have been 
proposed in the last decade to calibrate PC meta-models, namely projection 


Ghiocel and Ghanem 

(2002); 

Le Maitre et al. ( 

2002 

); Keese and Matthies 

(2005), stochastic collocation 

K)iu and Hesthaven ( 

2001 

); Xiu (2009) and least- 

square minimization methods 

Chkifa et al. (20R 

1); J 

digliorati et al. (2014); 

Berveiller et al. (2006t 

)); Blatman and Sudret (2010a, 2011). In this paper 


we adopt the least-square minimization method. The expansion coefficients 
a = {a^, Q: G ^ C N^} are calculated by minimizing the expectation of the 
least-squares residual: 


a = argminE 


y- ac.V’o(X) 
^ a£A 


( 11 ) 


In practice the expectation in Eq. 0 is evaluated by an empirical sample- 
based estimator. DenotingbyT = 

the set of outputs of the exact model Ai for each point in the experimental 
design X, the discretized least-squares error minimization problem derived 
from Eq. © reads 


1 


N 


a = arg min — > T 


;(h _ 


aSRl-^l 


N ^ 

2 = 1 


E 

a&A 


a. 




( 12 ) 


The optimal expansion coefficients a may be computed by solving the linear 
system 


d = {F^Fr^F'^y, 


(13) 
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where F is the information matrix of size x |^| whose generic term reads: 




(14) 


Typically for smooth functions, a small number of polynomials is able to 
represent accurately the output of the computational model. Thus a further 
reduction of the set of predictors in high-dimensional space is possible. Various 
types of generalized regression algorithms have been proposed in the literature, 
namely the least absolute shrinkage operator (LASSO) Tibshiraru] (1996), the 
Least Angle Regression (LAR) Efron et al. ( 2004[ ), low-rank approximations 
Doostan et al. (2013); Peng et al. (2014) and compressive sensing Sargsyan 


et al. (2014). In the applications of this paper, the LAR algorithm is used in 


combination with hyperbolic index sets. 

Given a PCE meta-model, i.e. a set of polynomials “ipct, ct £ A and the 
corresponding parameters a^, the response of a new sample x G Dx may be 
eventually predicted by Eq. (§: 


y(PCE)=A^(PCE)(^) 


(15) 


2.3 Error estimation 


As seen in Eq. ([^, polynomial chaos expansions (PCE) are approximations of 
the exact computational model and thus the prediction at new input samples 
leads to some residual error. That is why error measures are developed to 
quantify the deviation between the exact output y = {y^''\i = 1,..., N} and 
the meta-model output The generalization error (also 

called L^-error) is the expectation of the squared output residuals Vapnik 
(flggsl), i.e. 


Fffgen — IE 


Y - 


y(PCE)y 


(16) 


where y(P*^E) corresponds to the truncated series (see Eq. ([^) and the expec¬ 
tation is defined with respect to the PDF of the input variables JC. If the com¬ 
putational model A4 is inexpensive to evaluate, the generalization error can 
be estimated accurately using an auxiliary validation set X = ..., 

which is sampled from the input distribution fx. The estimate of the gener¬ 
alization error then reads: 


1 ” 2 
Err gen = . (17) 

i=l 


However, this rarely happens in real applications since the very purpose of 
building a meta-model is to avoid evaluating A4 on a large sample set X. Note 
that in Section [^though, the various meta-modeling techniques are compared 
on analytical benchmark functions, making it possible to use such an estimate 
of the generalization error. 

When the use of a large validation set is not affordable, the empirical error 
based on the available experimental design X may be defined: 


T ^ 2 


( 18 ) 



































Normalizing the empirical error by the variance of the output values leads to 
the relative empirical error which is defined as 


(-emp — 


Ef=i 




(19) 


where pLy is the mean value of the output values 3^. The empirical error 
which is based on the experimental design generally underestimates the gen¬ 
eralization error. In particular if the number of polynomials |^| is close to the 
number of samples N in the experimental design, the empirical error tends 
to zero whereas the true (generalization) error does not. This phenomenon is 
called overfitting (in the extreme case where Ai = |^| the predictors may inter¬ 
polate the experimental design points and thus the empirical error vanishes). 
Hence the leave-one-out (LOO) error has been proposed as an estimate of the 


generalization error Stone (1974); Geisser (1975). The general formulation of 


the leave-one-out error is 


N 




( 20 ) 


i=l 


where A4|^^^^^(-) is a PCE model built from the sample set = Xfix^^ = 

j = 1, ■ • •, f — 1, f + 1, • ■ •, N} and y = {y^l ,i = 1,..., N} are the 
response values of the exact computational model. The LOO error is a special 


case of the leave-k-out cross-validation error Allen (1971) which discards k 


samples from the initial experimental design to build up a model and predict 
the error at the k discarded samples. 

In theory the computational cost of the LOO error is proportional to the 
number of samples N since it would require the determination of N PCE 
meta-models corresponding to each experimental design X^~'‘'\ In the special 
case of linearly parameterized regression, which is the case for PCE, it is 
possible to calculate the LOO error analytically without building N separate 


models. The LOO error reads (see e.g. Saporta (2006); Blatman (2009) for 
the proof) 


Err 


where hi is the diagonal term of the matrix F (F' F) ^ F ' and the infor¬ 


(PCE) 

LOO 


1 ^ 
N ^ 


2=1 


I- hi 


( 21 ) 


mation matrix F is defined in Eq. (14). Note that the PCE used in Eq. (21) 


is built only once from the full experimental design X. 


3 Kriging 

3.1 Problem definition 


The second meta-modeling technique in this paper is Kriging, also known 
as Gaussian process modeling, which assumes that the response of a compu¬ 


tational model is a realization of a Gaussian random process Santner et al 
M{x) PC M^'Yx) = (3^f{x)-P Z{x), (22) 


(120031), i.e. 
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where jS f{x) = Ylj=i /3jfj{x) is the mean value of the Gaussian process, 
also called trend, with coefficients /3, cr^ is the Gaussian process variance 
and Z(x) is a zero-mean, unit-variance stationary Gaussian process. The 
zero-mean Gaussian process Z{x) is fully determined by the auto-correlation 
function between two input sample points R{x,x') = R{\x — x'\;6) due to 
stationarity, where 6 are hyper-parameters to be computed. 

Various correlation functions can be found in the literature IRasmussen andl 


Williams (2006); Santner et al. (2003), some of which are the linear, exponen¬ 


tial, Gaussian (also called squared exponential) and Matern autocorrelation 
function. In this paper the Matern autocorrelation function is mainly used 
as it is a generalization of the exponential and the Gaussian autocorrelation 


functions. The general Matern kernel of degree u is defined as Matern (1986) 


M 


R{\x — x'\;l, = ]^ 


1 


i=l 


‘r,0 


Xi 


k 




Xi - X, 


li 


(23) 


where x and x' are two sample points in the input space I = {U > t), i = 
1,..., M} are the scale parameters (also called correlation lengths), u > 1/2 
is the shape parameter, r(-) is the Euler Gamma function and Ku{-) is the 
modified Bessel function of the second kind (also known as Bessel function 
of the third kind). In many publications the shape parameter is set to either 


= 3/2 or 1 / = 5/2 which simplifies Eq. (23) to Roustant et al. (2012): 

M / 


R{\x - x'\;l,v = 3/2) = ]^ 1 + 


i=l 


\/3|xi - x'l 

k 


exp 


\/3|xi - 


M 


R(\x-x'[,l,i' = 5/2) = ]^ 1 + 


\/5 \xi - x'l 5(xi - x') 


2=1 


k 


+ 


31? 


k 


exp 


(24) 


\/5 |xi - 


li 
(25) 

Apart from the correlation part in Eq. (22) there is also a trend part 


P f{x). Three different flavors of Kriging are defined in the literature 


Ras¬ 


mussen and Williams (2006); Santner et al. (2003); Stein (1999), namely sim¬ 


ple, ordinary and universal Kriging according to the choice of the trend. 
Simple Kriging assumes that the trend has a known constant value, i.e. 
P'^fix) = Po- In ordinary Kriging the trend has a constant but unknown 
value, i.e. P = 1, fi{x) = 1 and /3i is unknown. The most general and flexi¬ 
ble formulation is universal Kriging which assumes that the trend is composed 
of a sum of P pre-selected functions fk{x) 


I.e. 


0^ f{x) = '^f3kfk{x), 


(26) 


k=l 


where /3fc is the trend coefficient of each function. Note that simple and 
ordinary Kriging are special cases of universal Kriging. As discussed later in 
this paper, one approach to set up a trend is to use a sparse set of polynomials, 
which defines a new variant of universal Kriging. 


3.2 Calibration of the Kriging model 

Given a value for the auto-correlation hyper-parameters 6, the calibration 
of the Kriging model parameters {P{0),ay{6)} may be computed using an 
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empirical best linear unbiased estimator (BLUE). The optimization yields an 
analytical expression as a function of 9: 


/3(0) = (f'^R-^F) FR'^T 


-1 


aliO) = ^ {y - F13)'^ {y - F P) ^ 


(27) 


(28) 


where y = {y^''\ i = are model responses of the exact com¬ 

putational model on the experimental design X = 

Rjj = — X^^'^\'i9) is the correlation matrix and Fij = fj{x^^^) is the 

information matrix. 

In recent developments, the optimal correlation parameters 6 may be de¬ 


termined by either a maximum-likelihood-estimate (denoted by ML) Marrel 


et al. (2008); Dubourg (2011) or by leave-one-out cross-validation {CV) Ba- 
choc (2013b). The optimal parameters are determined through a minimization 


which reads: 


9 ml = arg min 

9 


{y-F p)'^ R-i {y-Fp) (det r; 


N 


l/iV 


(29) 


9cv = arg min 

e 


T^R(0)-Miag (R(6»)-^) ^R(6»)-^T 


(30) 


The comparison of both approaches shows that ML is preferable to CV in 
well-specified cases, i.e. when the meta-model autocorrelation function fam¬ 
ily is identical to the autocorrelation function of the computational model. 
For practical problems, i.e. assuming a black-box model, the autocorrelation 
function family is not known with certainty. In this case CV shall lead to 


more robust results than ML, as discussed in Bachoc (2013b). 


Determining the optimal correlation parameters in Eq. (29) and (30) is a 
complex multi-dimensional minimization problem. Optimization algorithms 
can be cast into two distinct categories: local and global optimization al¬ 
gorithms. Local methods are usually gradient based algorithms such as the 


quasi-Newton Broyden-Fletcher-Goldfarb-Shanno (BEGS) algorithm Goldfarb 


(1970); Fletcher (1970); Shanno (1970) and its modifications Byrd et al. 


(1999). Global methods are algorithms such as genetic algorithms Goldberg 
(1989) and differential evolution algorithms Storn and Price ( 1997~|Deng 


et al. (2013). The best optimization algorithm is problem dependent and in 


many cases not known a-priori. 

The optimal correlation parameters are then used for predicting the model 
response at new samples of the input space. By assumption, the prediction of 
a Kriging model of a new point jc is a Gaussian random variable with mean 
^y{x) and variance o-|(a;): 


l^yix) = f{x)'^P + r(®)'^R ^ (V - FP ), 


(31) 




1 - {f{x)'^r{x)'^ 


■ 0 

FT ■ 

-1 

’ fix) ■ 


F 

R 


r{x) 



(32) 


where ri(x) = R{\x — is the correlation between the new sample x 

and the sample of the experimental design. The prediction mean is used 
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as the surrogate to the original model A4, whereas the variance gives a local 
error indicator about the precision. It is important to note that the Kriging 
model interpolates the data, i.e. 

l^y{x^%=M{x^^), a|(x«)= 0 , G A'. 

Apart from this procedure to calibrate the meta-model and predict model 
responses, the Kriging meta-modeling technique has been developed further 
in recent years. The latest developments in Kriging are contributed in the as¬ 


pects of optimal estimation of the hyper-parameters Bachoc 


et al. 


kerne’ 


(2014); Bachoc (2013a), the use of adaptive kernels 


(2013b); Bachoc 


Duvenaud et al. 


(2011); Ginsbourger et al. (2013) and the use of additive auto-correlation 


Durrande et al. ( 

2012 

2013) 

Ginsbourger et al. 

(2013 


3.3 Error estimation 


A local error measure for any sample x is given by the prediction variance 

^ 


crg{x) in Eq. (32). This information is useful to detect regions where the 
prediction accuracy is low. Adding new samples to the experimental design 
X in the regions with high prediction variance may lead to an overall increase 
in the accuracy of the meta-model in that region. This characteristics is 
exploited when devising adaptive experimental designs in structural reliability 


( 2011 ). 


analysis, see Echard et al. (2013); Bichon et al. (2008, 2011); Dubourg et al. 


A simple global error measure of the accuracy of the meta-model (such as 


Eq. (19) for PC expansions) is not available for Kriging due to its interpolating 
properties, which make the empirical error vanish (considering no nugget effect 
in its auto-correlation function). Thus one approach to a global error measure 
is the leave-one-out (LOO) error 


1 ^ 9 


(33) 


2 = 1 


where is the prediction mean Hy of sample by a Kriging meta¬ 
model based on the experimental design = A’\x W and T = = 

1,..., A^} is the exact model response. Dubrule (1983) derived an analytical 
solution for the LOO error for universal Kriging without computing the N 


meta-models explicitly in the same spirit as Eq. (21) for PC expansions. The 
prediction mean and variance are given by 


l^y,{-i) 






j=i 


1 


O',- 


(34) 


(35) 


vA-i) R..’ 

where B is a square matrix of size {N-\-P) with N and P denoting the number 
of samples in the experimental design and the number of polynomials in the 
trend part, respectively: 

-1 


B = 


F 

F'^ 0 


(36) 
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where is the Kriging variance for the full experimental design X esti¬ 


mated by Eq. (28). A generalized version of this algorithm called v-fold cross¬ 


correlation error can be found in Dubrule (1983). 


3.4 PCE as a particular case of universal Kriging 

PC expansions can be interpreted as Kriging models where the samples of the 
experimental design are uncorrelated, i.e. where the autocorrelation function 
consists of the Dirac function: 


R{\x — x'\) = 6{x — x'). 


(37) 


The correlation matrix is then the identity matrix of size N, i.e. R = Itv- 
This reduces Eq. (27) to Eq. (13) and Eq. (31) to Eq. ([^. 


Further it can be shown that the leave-one-out error in Eq. (33)-(35) re¬ 


duces to Eq. (21) by the following derivations. Consider the symmetric parti¬ 
tioned matrix C and the corresponding inverse D, i.e. D = which are 
defined as: 


C = 


Cll Ci2 
CJ2 C22 


D 


Dll Di2 
0^2 D22 


where Cn, Du (resp. C 22 ) D 22 ) are square matrices with dimension N (resp. 
P). Using block matrix inversion one can derive: 


Dll = cr/ + cr/ci2 (C 22 - c72cr/ci2 


-1 


pT p-1 


(38) 


In the context of the leave-one-out error, taking C = B in Eq. (36), Cn = 
C 12 = F, C 22 = Op: 

-1 


1 


1 


Dll = H—2^-^^ I Op — F ^IatF 

V (y 


1 


F' 


N 


= —1 + —F -F'F 






-1 


fT = _ ( I _ F f F^F 


-1 


(39) 


Then, the leave-one-out error in Eq. (33) combined with Eq. (34) and the 


above inverse formulation of the B matrix reads: 

2 


Err 


K 

LOO 


‘ E 5''" + E §3'“ - = V E E 


N 


i=l 




i=l 


i=i 


TV / TV 

i=i \ ** i=l 


N 


N 


I - F ( F'^F 


-1 


3;(T 




TV 

E 

^=1 

N 

E 

i=l 


1 - 


(^diag F(FTF)"^FT 


N 

/ -r \ — 1 -r 

y(.) _ ^ 

F(f^f) F^ 

i=i 



yU) 




^ y{i) — 


^/(a«) 


diag 

' H 

1 

1_1 

) 



( 40 ) 
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which is equivalent to the formulation of the leave-one-out error in Eq. (21) 
for the case of PCE and Thus the leave-one-out error in 

PCE can be seen as a special case of the leave-one-out error in the Kriging 
framework. 


4 PC-Kriging 

4.1 Principle 

The characteristic of Kriging is to interpolate local variations of the output 
of the computational model as a function of the neighboring experimental 
design points. In contrast, polynomial chaos expansions (PCE) are used for 
approximating the global behavior of Ai using a set of orthogonal polynomials. 
By combining the two techniques we aim at capturing the global behavior of 
the computational model with the set of orthogonal polynomials in the trend of 
a universal Kriging model and the local variability with the Gaussian process. 
The new approach called Polynomial-Chaos-Kriging (PC-Kriging) combines 
these two distinct meta-modeling techniques and their characteristics. 

Using now the standard notation for truncated polynomial chaos expan- 
we cast the PC-Kriging meta-model as follows 

): 

M(x) K, ^ aaipa{x) a'^Z{x), (41) 

aeA 


sions (see Eq. 


Sudret (2014c 


Schobi and 


where o-a'>Pa{x) is a weighted sum of orthonormal polynomials describ¬ 

ing the mean value of the Gaussian process and A is the index set of the 
polynomials. Z{x) is a zero-mean, unit-variance stationary Gaussian process 
defined by an autocorrelation function R[\x — x'\]0) and is parametrized by 
a set of hyper-parameters 9. 

Building a PC-Kriging meta-model consists of two parts: the determina¬ 
tion of the optimal set of polynomials contained in the regression part {i.e. the 
truncation set A) and the calibration of the correlation hyper-parameters 6 as 
well as the Kriging parameters {cr^, a^}. The set of polynomials is determined 


using the Least-Angle-Regression (LAR) algorithm as in|Blatman and Sudret 


(2011) together with hyperbolic index sets to obtain sparse sets of polynomi¬ 
als. After the set of polynomials is fixed, the trend and correlation parameters 
are evaluated using the universal Kriging equations (Eq. (27)-(30)). 


4.2 Algorithm 

The two distinct frameworks for PCE and Kriging can be combined in various 
ways. In this paper two approaches will be explained in detail, i.e. the Sequen¬ 
tial PC-Kriging (SPC-Kriging) and the Optimal PC-Kriging (OPC-Kriging). 
Both approaches are based on the same input information, namely the ex¬ 
perimental design AT, the corresponding response values y obtained from the 
computational model Ai{X), the description of the stochastic input variables 
(joint PDE fx) and the parametric expression of the an auto-correlation func¬ 
tion R[\x — x'\] 6). The two approaches are defined as follows: 
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Sequential PC-Kriging (SPC-Kriging): in this approach, the set of poly¬ 
nomials and the Kriging meta-model are determined sequentially. The 
assumption behind this procedure is that the optimal set of polynomials 
found by the LAR algorithm in the context of pure PCE can be used 
directly as an optimal trend for the universal Kriging. In a first step the 
optimal set of polynomials is determined using the PCE framework: A is 


found by applying the LAR procedure as in Blatman and Sudret (2011) 


The set of multivariate orthonormal polynomials A is then embedded 
into a universal Kriging model as the trend. The universal Kriging 


meta-model is calibrated using Eq. (27)-(30). 


At the end of the algorithm the accuracy of the meta-model can be 
measured by the leave-one-out error given in Eq. ( p^ or, when using a 
validation set, by the sample-based generalization error in Eq. ©• 

The SPC-Kriging algorithm is illustrated in Eigurej^in which the white 
boxes represent the required input information and the blue boxes rep¬ 
resent the computational tasks. Given a calibrated SPC-Kriging model, 
the response of new input realizations {i.e. the prediction) is computed 


by Eq. (31) and Eq. (32). 



Figure 2: Flowchart for Sequential-PC-Kriging (SPC-Kriging) 


Optimal PC-Kriging (OPC-Kriging): in this approach, the PC-Kriging 
meta-model is obtained iteratively. The set of orthonormal multivariate 
polynomials is determined by LAR algorithm in the same way as in SPC- 
Kriging. Yet the LAR algorithm results in a list of ranked polynomials 
which are chosen depending on their correlation to the current residual at 
each iteration in decreasing order. OPC-Kriging consists of an iterative 
algorithm where each polynomial is added one-by-one to the trend part. 
In each iteration, the coefficients of the trend and the parameters of 
the auto-correlation function are calibrated. In the end, a number |^| of 
different PC-Kriging models are available. The |^| meta-models are then 


compared in terms of the LOO error (Eq. (33)). The optimal PC-Kriging 


meta-model is then chosen as the one with minimal leave-one-out error. 
Figure illustrates the OPC-Kriging algorithm in a flowchart. The 
notation means a universal Kriging meta-model where the 

trend is modeled by the Q first polynomials selected in A according to the 
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ranking obtained by LAR. Note that the red box represents the universal 
Kriging model which eventually minimizes the LOO error {Q = P*) and 
which is thus finally chosen. The final box marks the prediction of new 
model responses which is computed by Eq. (31) and Eq. (32). 



Optimal PC-Kriging 
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Figure 3: Flowchart for Optimal PC-Kriging (OPC-Kriging) 


As a summary PC-Kriging can be viewed as a universal Kriging meta¬ 
model with a non-standard trend part. Thus the error estimates from Kriging 
in Section |3.3| are valid here with no modification. In particular, the LOO 


error in Eq. (33) is computed to compare different PC-Kriging models and 
to compute the optimal meta-model in OPC-Kriging. The performance of 
both approaches and the comparison to the traditional PCE and Kriging 
approaches is now illustrated for a variety of benchmark analytical functions. 
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5 Analytical benchmark functions 


5.1 Setup 

Easy-to-evaluate functions are tested to verify and validate the new PC- 
Kriging approach by comparing the meta-model to the exact model response. 
In this paper, six analytical functions of different dimensionality are illus¬ 
trated, namely four with uniformly distributed input variables (i.e. Ishigami, 
Sobol’, Rosenbrock and Morris functions) and two with Gaussian input vari¬ 
ables (i.e. Rastrigin and O’Hagan function). References to the original use of 
the benchmark functions in the literature are given below. 

The first four analytical functions use uniformly distributed random vari¬ 
ables in the input space. The Ishigami function is a smooth function with 
three independent input parameters commonly used for benchmarking meth¬ 
ods in global sensitivity analysis. 

fi{x) = sinxi -|- 7 sin^ X 2 -|- 0.1 X 3 sinxi, (42) 


where Xi ~ vr,vr), i = 1,2,3. The SohoV function is also well-known 
sensitivity analysis because the Sobol’ indices are easy to derive analytically 


Sobol’ (1993): 


f2ix) = 


|4xi -2\+Ci 

1 + Cj 


(43) 


where Xj ~ ZY(0,1), z = 1,..., 8 and c = (1, 2, 5,10, 20, 50,100, SOO)"*" as in 


Sudret (2008). Due to the absolute value operator in the enumerator the func¬ 


tion behaves non-smoothly at the point Xi = 0.5. The Rosenbrock function is 
a polynomial function with a 2-dimensional input space Rosenbrock (1960|: 


fsix) = 100 {x2 - xf)'^ + (1 - xiY , 


(44) 


where Xi ~ U{—2,2), i = 1,2. The last function considered is the Morris 
function which is defined by Morris (1991) 

20 20 20 

fYx) = '^PiWi + '^ Pij WiWj -h ^ I3iji WiWjWi + 5 wiW2W3Wi, (45) 

i=l i<j i<j<l 

where Xj ~ ZY(0,1), z = 1,..., 20 and Wi = 2 {xi — 1/2) for all i except for 

1 1 Xi 

i = 3,5,7 where Wi = 2(-1/2). The coefficients are defined as: 

X T 0.1 

/3i = 20, i = l,.. . ,10; /3ij = -15, i, j = 1,... ,6; Piji = -10, i,j,l = 1,... ,5. 
The remaining coefficients are set equal to /3j = (—1)* and flij = (—l)*"''-^ 


as 


m 


Blatman (2009). 


Two other benchmark functions of independent Gaussian variables are 
also studied. The Rastrigin function has a two-dimensional input space and 
is defined by Rastrigin (1974) 


/ 5 (x) = 10 - ^(Xi - 5 cos(27rxi)). 


(46) 


2=1 
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where Xi ~ AA(0,1), i = 1 , 2 . The last function is the O’Hagan function 
which is dehned bylOakley and O’Hagan (2004) 


fe{x) = ai^x + a2^ sin(s) + cos(a;) + x^Qx, 


(47) 


where Xi ~ M{0, 1), i = 1,..., 15. The vectors {ai, a 2 .a 3 } and matrix Q are 


defined in Oakley and O’Hagan (2004). 


Note that the functions fi — have uniform input random variables. Ac¬ 
cordingly the PC trend in PC-Kriging is built up from multivariate Legendre 
polynomials. In contrast f^, fe have Gaussian input random variables. Thus 
the PC trend is modeled by multivariate Hermite polynomials (see Tab. [^. 


5.2 Analysis 

At the beginning of each algorithm the experimental design is generated 
with the Latin-hypercnbe sampling technique McKay et al. (1979). Then 


the meta-modeling is processed applying the four previously discussed meta¬ 
modeling techniques, i.e. ordinary Kriging, PCE, SPC-Kriging and OPC- 
Kriging. Note that for the Kriging meta-models, the maximum likelihood 


formulation (Eq. (29)) in combination with the gradient based BEGS opti¬ 


mization algorithm is used in order to compute the optimal correlation pa¬ 
rameters. 

Their performance is compared by means of the relative generalization 
error which is defined as the ratio between the generalization error (Eq. ®) 
and the output variance: 


E 




(y _y(PCE)y 


Var [K] 


(48) 


The error is estimated here using a large validation set X = 
of size n = 10 ^, which results in 


^gen 


Er.,(vi(xi‘>)-y((ic«))’ 

E”.i 


( 49 ) 


where Hy is the mean value of the set of exact model responses over the valida¬ 
tion set Y = ..., ..., A4(a;^"'^)}. Eor all Kriging-based 

approaches, the meta-model Ai(x) is the prediction mean fJ.y{x). Note that 
the samples in X follow the distribution of the input variables X in order to 
obtain a reliable error estimate. 

For each experimental setup, the analysis is replicated to account for the 
statistical uncertainties in the experimental design. 50 independent runs of 
the full analysis are carried ont and the results are represented using boxplots. 
In a box plot the central mark represents the median value of the 50 runs, 
the edges are the 25th and 75th percentile denoted by 525 and ^ 75 . The 
whiskers describe the boundary to the outliers. Outliers are defined as the 
values smaller than 525 — 1.5 (^75 — 525 ) or larger than ^75 -|- 1.5 (^75 — ^ 25 )- 
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5.3 Results 


5.3.1 Visualization of PC-Kriging’s behavior 

The different types of meta-models are illustrated in Fig. which shows a 
2-dimensional contour plot of the output of the Rastrigin function (N = 
128 samples) (Fig. [4(A) ) and its approximations by PCE, ordinary Kriging 
and PC-Kriging (Fig. |4(^ 1(11) )• The Rastrigin function has a highly oscil¬ 
latory behavior on the entire input space as seen in Fig. |4(A)[ This behavior 
is difficult to meta-model with a small number of samples because many local 
minima/maxima are missed out. 

The analytical formulation of the Rastrigin function is a combination of 
a quadratic component and a high-frequency trigonometric component. The 
PCE model in Fig. |4(C)1 captures the global characteristic of the function, i.e. 
the quadratic component, whereas the ordinary Kriging model in Fig. |4(B)| 
approximates the local characteristics, i.e. the high-frequency trigonometric 
component. Finally, the combination of PCE and Kriging leads to a better 
meta-model as shown in Fig. |4(D)[ 

Note that the meta-models in Fig. have a high accuracy around the 
origin of the coordinate system due to the definition of the input vector PDF 
as standard normal distributions (V AA(0,1)). 



(A) Rastrigin function 



(B) Ordinary Kriging 


(C) PCE 



(D) PC-Kriging 


Figure 4: Rastrigin function - Visual composition of PC-Kriging 


5.3.2 Small experimental design 

The four meta-modeling techniques are compared for the six analytical func¬ 
tions using experimental designs of increasing size. The number of samples 
is chosen so that it yields a large range of relative generalization errors on 
the second axis. The results are illustrated in Fig. [5 10 In each figure (A) 
shows the ordinary Kriging model and (B) shows the PCE model. The new 
approaches SPC-Kriging and OPC-Kriging are shown in (C) and (D) respec¬ 
tively. 
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Figure 5: Ishigami function - Relative generalization error (Eq. (49)) for the various 
meta-modeling approaches 


Figure shows the relative generalization error for the various meta¬ 
modeling techniques for the Ishigami function. For a small sample size of 
N = 20 samples, ordinary Kriging performs best with respect to the median 
value of the box plots. SPC-Kriging and OPC-Kriging perform worse due to 


the effect of overfitting (see also Section 2.3). The number of parameters to 


be estimated in the case of PC-Kriging is larger than in the case of ordinary 
Kriging due to the number of polynomials in the trend of the Kriging model. 
Thus, OPC-Kriging (and also SPC-Kriging) are more prone to overfitting 
than ordinary Kriging for small experimental designs. When the number of 
samples is increased, however, the two PC-Kriging approaches perform better 
than the traditional approaches because their median value and their varia¬ 
tion of the error are lower. For the large sample sizes {N > 50), PC-Kriging 
performs similarly to PCE, though slightly better. OPC-Kriging is slightly 
more accurate than SPC-Kriging over the whole range of sample sizes. 

Figure [^presents the results of the Rosenbrock function, which is a purely 
polynomial function and can be modeled accordingly with a small number of 
polynomials based on a small number of points in the experimental design 
{e.g. in the case of PCE). This is the reason why the number of points lies 
within N = 8 ,..., 20. Highly accurate surrogate models are obtained with 
only 20 samples. Eor small sample sizes OPC-Kriging performs best among 
the four techniques in terms of the relative generalization error. 

The Sobol’ function is more complex than the two previous functions be¬ 
cause of the dimensionality (M = 8) and the non-smooth behavior at Xi = 0.5. 
Thus more samples are needed to obtain a similar range of relative generaliza¬ 
tion errors compared to the previous functions, as seen in Fig. Behaviors 
for the larger sample sizes {N = 64, 128) are very similar among the meta- 
modeling approaches, although PC-Kriging performs slightly better than the 
traditional PCE and ordinary Kriging approaches. For very small sample sizes 
(N = 16, 32), OPC-Kriging performs signihcantly better than the others. 

Figure shows the results for the Morris function. A large experimental 
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Figure 6: Rosenbrock function - 
various met a-mo deling approaches 


Relative generalization error (Eq. (49)) for the 
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Figure 7: Sobol’ function - 
meta-modeling approaches 


Relative generalization error (Eq. (49)) for the various 


design is required to properly surrogate the computational model because of 
the high dimensionality of the input vector X and the amount of interac¬ 
tive terms of different input variables Xi in the analytical formulation (see 
Eq. (45)). The relative generalization error of the two PC-Kriging approaches 
resembles more the one of ordinary Kriging than the one of PCE in this case. 
PCE is not capable of modeling this analytical function with a small number 
of samples. 

The results associated with the Rastrigin function are shown in Eig.[^ De¬ 
spite the low dimensionality of the input (M = 2), many samples are needed 
to obtain small error estimates. This is due to the fact that the function out¬ 
put is highly oscillatory over the entire input space as previously illustrated 
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Figure 8: Morris function - Relative generalization error (Eq. (|4^) for the 
meta-modeling approaches 
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Figure 9: Rastrigin function - Relative generalization error (Eq. (49)) for the 
various meta-modeling approaches 


in Fig. In comparison to Section 5.3.1 which describes the qualitative 


performance of PC-Kriging on the Rastrigin function, the quantitative ben¬ 
efit of combining PCE and Kriging becomes visible in Fig. PC-Kriging 
performs better than the traditional approaches. Ordinary Kriging performs 
the worst followed by PCE. OPC-Kriging has statistically the lowest relative 
generalization errors over the whole range of experimental design sizes. 

Figure [^displays the results associated with the O’Hagan function. Sim¬ 
ilarly to the Morris function, the performance of PC-Kriging in the case of the 
O’Hagan function resembles that of ordinary Kriging whereas PCE performs 
worse than the other three approaches. Over the entire displayed range of 
experimental designs in Fig. pR} the performance of OPC-Kriging is slightly 
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Figure 10: O’Hagan function - Relative generalization error (Eq. (49)) for the 
various meta-modeling approaches 


better than the performance of SPC-Kriging and ordinary Kriging. Note that 
meta-modeling the O’Hagan function requires less samples in the experimen¬ 
tal design to obtain the same accuracy as in the case of the Rastrigin function 
despite the fact that the O’Hagan function has a 15-dimensional input space 
and that both functions are very smooth. 

Summarizing the results of all six analytical functions in Fig. [5]|10[ the 
proposed PC-Kriging approaches perform better than or at least as good as 
the traditional PCE and Kriging approaches. Note that for functions like 
O’Hagan (Fig. 10) and Morris (Fig.[^ the performance of PC-Kriging is more 
similar to Kriging than PCE, whereas for the other functions the performance 
of PC-Kriging resembles more that of PCE. As one could expect, there is no 
general rule so as to decide whether PCE or Kriging provide the most accurate 
meta-models for a given experimental design. The advantage of PC-Kriging 
is to perform as least as well as the best of the two. 

The combination of PCE and Kriging and its increased accuracy comes 
with a higher computational cost. The traditional ordinary Kriging and PCE 
approaches have the lowest computational cost, SPC-Kriging has an interme¬ 
diate and OPC-Kriging has the highest cost. The high cost of OPC-Kriging 
originates from the iterative character of the algorithm and the accompa¬ 
nying repetitive calibration of Kriging models. OPC-Kriging computes EAR 
and calibrates P Kriging models with increasing complex trend, whereas ordi¬ 
nary Kriging consists of a single calibration of a Kriging model with constant 
trend. For a single calibration of a surrogate of the Ishigami function (ex¬ 
perimental design of size N = 128 samples) the ratio of computational times 
when comparing PCE to ordinary Kriging, SPC-Kriging and OPC-Kriging is 
approximately 1 : 5, 1 : 20 and 1 : 200, respectively. 

Note that it is intended to apply these techniques to realistic problems 
where the evaluation of the exact computational model response lasts much 
longer than the computation of a meta-model. The apparent computational 
overload of OPC-Kriging will not be anymore an issue in many practical 
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applications. 


5.3.3 Large experimental designs 


When the resources for experiments are limited the focus lies on doing as few 
computational model runs as possible as discussed in the previous section. 
In order to describe the entire behavior of the meta-modeling approaches, 
the Ishigami function is also studied now for larger experimental designs in 
order to assess the convergence of the various schemes. Results are shown in 
Fig.[TTl This figure illustrates the evolution of the relative generalization error 
from small to large sample sizes on the logarithmic (base 10) scale. The error 
measure decreases fast when enlarging the sample set because the Ishigami 
function is composed of sine and cosine functions which can be approximated 
well with series of polynomials, e.g. as in a Taylor expansion. Thus PCE is 
the dominating effect for PC-Kriging. 

For large sample sizes, ordinary Kriging is outperformed by the other 
three approaches. Kriging in general works well with small sample sizes. If 
too many samples are used, the interpolation algorithm becomes unstable due 
to singularities and bad conditioning in the auto-correlation matrix. If a large 
sample size is available, regional Kriging models on a subset of samples, e.g. 


the neighboring samples, are more suitable Dubrule (1983). 
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Figure 11: The Ishigami function - Relative generalization error (Eq. (@) associ¬ 
ated with large experimental designs for the various meta-modeling approaches 


The performance of PC-Kriging and also PCE for a large number of sam¬ 
ples (here 128 and 256 samples in Fig. 0 is in the order of magnitude of 
the machine computation precision. Errors around e^en ~ 10“^^ originate 
from numerical round-off errors which are not reducible by adding more sam¬ 
ples. It is questionable though, whether in reality such a high accuracy in the 
meta-model prediction is needed. 
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Figure 12: Evolution of the leave-one-out error (LOO) and relative generalization 
error inside the OPC-Kriging algorithm as a function of the number of polynomials 
in the regression part 


5.3.4 Evolution of the error measures 


The OPC-Kriging algorithm includes the tracking of the LOO error to opti¬ 
mally choose the sparse set of orthonormal polynomials. The evolution of the 
LOO error for the Ishigami function and a sample size of At = 128 samples is 
presented in Fig. 12 The experimental-design-based LOO error (dashed, red 


line) is compared to the relative generalization error which is shown as the 
solid black line. 

The first point to notice is that the LOO error slightly under-predicts the 
true value of the relative generalization error for all sizes of polynomial sets. 
This is due to the fact that the LOO error is based solely on the information 
contained in the experimental design samples whereas the relative general¬ 
ization error is based on a large validation set (n = 10®). Although there is 
an inherent under-prediction, the overall behavior of the two error measures 
is similar. Thus the choice of the optimal set of polynomials for the OPC- 
Kriging can be based on the LOO error, which is obtained as a by-product 
of the procedure used to ht the parameters of the PC-Kriging model. In the 
example case of Fig. 12, choosing only half of the polynomials, i.e. P = 27 


leads to a meta-model which is almost as accurate as using all 56 polynomials. 
The optimal set of polynomials can be chosen at the point where the decrease 
in LOO error becomes insignihcant. This reduces the number of polynomials 
needed and thus also reduces the complexity of the OPC-Kriging meta-model. 


6 Conclusion 

In the context of increasing computational power, computational models in 
engineering sciences have become more and more complex. Many analyses 
such as reliability assessment or design optimization, require repeated runs 
of such computational models which may be infeasible due to resource limi- 
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tations. To overcome these limitations computational models are nowadays 
approximated by easy-to-evaluate functions called meta-models. 

This paper summarized the principles of two popular non-intrusive meta- 
modeling techniques, namely Polynomial Chaos Expansions (PCE) and Krig- 
ing (also called Gaussian process modeling). Then the combination of the 
two approaches to a new meta-modeling approach called Polynomial-Chaos- 
Kriging (PC-Kriging) is proposed. Two formulations of PC-Kriging are intro¬ 
duced in this paper, namely Optimal-PC-Kriging (OPC-Kriging) and Sequential- 
PC-Kriging (SPC-Kriging). SPC-Kriging employs first a least-angle-regression 
(EAR) algorithm to determine the optimal sparse set of orthonormal polyno¬ 
mials in the input space. Then, the set of polynomials is used as the trend 
of a universal Kriging meta-model. OPC-Kriging employs the same least- 
angle-regression algorithm as SPC-Kriging, yet iteratively adds polynomials 
to the trend part of the universal Kriging model one-by-one, and fit the hy¬ 
perparameters of the auto-correlation function in each iteration. In this case 
polynomials are added to the trend in the order they are selected by the EAR 
algorithm. Based on the EOO error the best meta-model is found to be the 
OPC-Kriging meta-model. 

The performance of the four approaches (ordinary Kriging, PCE, SPC- 
Kriging, OPC-Kriging) is compared in terms of the relative generalization 
error on benchmark analytical functions. The results show that PC-Kriging 
is better than, or at least as good as the distinct approaches for small exper¬ 
imental designs. Specifically, OPC-Kriging is preferable to SPC-Kriging as it 
reduces the number of polynomials in the regression part and thus reduces 
the complexity of the meta-model, at a computational calibration cost which 
is however higher than that of SPC-Kriging. 

The analysis of the performance of PC-Kriging is limited to some bench¬ 
mark analytical functions in this paper. The ongoing research applies PC- 
Kriging to realistic engineering problems such as reliability analysis or design 
optimization. The idea of adaptive experimental designs (also called design 
enrichment) is introduced in order to increase the accuracy of the surrogate 
in some specific regions of the input space {e.g. close to the zero-level of the 
limit state function in reliability analysis) instead of everywhere. The ini¬ 
tialization of the iterative algorithm is a small initial experimental design to 
which points are added in regions of interest. These added points will then in¬ 
crease the quality of the meta-model specihcally in those regions. Preliminary 
ideas developed in Schobi and Sudret ( |2014b|c|aD are currently investigated 
in details. 
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